Search results for "Data stream"
showing 10 items of 50 documents
Hyperspectral dimensionality reduction for biophysical variable statistical retrieval
2017
Current and upcoming airborne and spaceborne imaging spectrometers lead to vast hyperspectral data streams. This scenario calls for automated and optimized spectral dimensionality reduction techniques to enable fast and efficient hyperspectral data processing, such as inferring vegetation properties. In preparation of next generation biophysical variable retrieval methods applicable to hyperspectral data, we present the evaluation of 11 dimensionality reduction (DR) methods in combination with advanced machine learning regression algorithms (MLRAs) for statistical variable retrieval. Two unique hyperspectral datasets were analyzed on the predictive power of DR + MLRA methods to ret…
Summarizing the state of the terrestrial biosphere in few dimensions
2020
In times of global change, we must closely monitor the state of the planet in order to understand the full complexity of these changes. In fact, each of the Earth's subsystems – i.e., the biosphere, atmosphere, hydrosphere, and cryosphere – can be analyzed from a multitude of data streams. However, since it is very hard to jointly interpret multiple monitoring data streams in parallel, one often aims for some summarizing indicator. Climate indices, for example, summarize the state of atmospheric circulation in a region. Although such approaches are also used in other fields of science, they are rarely used to describe land surface dynamics. Here, we propose a robust method to crea…
A Methodology to Derive Global Maps of Leaf Traits Using Remote Sensing and Climate Data
2018
This paper introduces a modular processing chain to derive global high-resolution maps of leaf traits. In particular, we present global maps at 500 m resolution of specific leaf area, leaf dry matter content, leaf nitrogen and phosphorus content per dry mass, and leaf nitrogen/phosphorus ratio. The processing chain exploits machine learning techniques along with optical remote sensing data (MODIS/Landsat) and climate data for gap filling and up-scaling of in-situ measured leaf traits. The chain first uses random forests regression with surrogates to fill gaps in the database (> 45% of missing entries) and maximizes the global representativeness of the trait dataset. Plant species are then a…
Towards Quantifying Non-Photosynthetic Vegetation for Agriculture Using Spaceborne Imaging Spectroscopy
2021
Non-photosynthetic vegetation (NPV) has been identified as a priority variable in the context of new spaceborne imaging spectroscopy missions. In this study, we provide a first attempt to quantify NPV biomass from the unprecedented data streams to be provided by multiple recently launched or planned instruments. A hybrid workflow is proposed including Gaussian process regression (GPR) trained over radiative transfer model (RTM) simulations and applying active learning strategies. A soybean field data set including two dates with NPV measurements on yellow and senescent (brown) plant organs was used for model validation, resulting in relative errors of 13.4%. This prototype retrieval model wa…
Earth system data cubes unravel global multivariate dynamics
2020
Understanding Earth system dynamics in light of ongoing human intervention and dependency remains a major scientific challenge. The unprecedented availability of data streams describing different facets of the Earth now offers fundamentally new avenues to address this quest. However, several practical hurdles, especially the lack of data interoperability, limit the joint potential of these data streams. Today, many initiatives within and beyond the Earth system sciences are exploring new approaches to overcome these hurdles and meet the growing interdisciplinary need for data-intensive research; using data cubes is one promising avenue. Here, we introduce the concept of Earth system data cu…
On the Classification of Dynamical Data Streams Using Novel “Anti–Bayesian” Techniques
2018
The classification of dynamical data streams is among the most complex problems encountered in classification. This is, firstly, because the distribution of the data streams is non-stationary, and it changes without any prior “warning”. Secondly, the manner in which it changes is also unknown. Thirdly, and more interestingly, the model operates with the assumption that the correct classes of previously-classified patterns become available at a juncture after their appearance. This paper pioneers the use of unreported novel schemes that can classify such dynamical data streams by invoking the recently-introduced “Anti-Bayesian” (AB) techniques. Contrary to the Bayesian paradigm, that compar…
Sequential Mining Classification
2017
Sequential pattern mining is a data mining technique that aims to extract and analyze frequent subsequences from sequences of events or items with time constraints. Sequence data mining was introduced in 1995 with the well-known Apriori algorithm. The algorithm studied the transactions through time, in order to extract frequent patterns from the sequences of products related to a customer. Later, this technique became useful in many applications: DNA research, medical diagnosis and prevention, telecommunications, etc. GSP, SPAM, SPADE, PrefixSpan and other advanced algorithms followed. In view of the evolution of data mining techniques based on sequential data, this paper discusses the multiple …
Modeling Multi-label Recurrence in Data Streams
2019
Most of the existing data stream algorithms assume a single label as the target variable. However, in many applications, each observation is assigned to several labels with latent dependencies among them, whose target function may change over time. Classification of such non-stationary multi-label streaming data with the consideration of dependencies among labels and potential drifts is a challenging task. The few existing studies mostly cope with drifts implicitly, and all learn models on the original label space, which requires a lot of time and memory. None of them consider recurrent drifts in multi-label streams and particularly drifts and recurrences visible in a latent label spa…
Structural clustering of millions of molecular graphs
2014
We propose an algorithm for clustering very large molecular graph databases according to scaffolds (i.e., large structural overlaps) that are common between cluster members. Our approach first partitions the original dataset into several smaller datasets using a greedy clustering approach named APreClus based on dynamic seed clustering. APreClus is an online and instance incremental clustering algorithm delaying the final cluster assignment of an instance until one of the so-called pending clusters the instance belongs to has reached significant size and is converted to a fixed cluster. Once a cluster is fixed, APreClus recalculates the cluster centers, which are used as representatives for…
On the Online Classification of Data Streams Using Weak Estimators
2016
In this paper, we propose a novel online classifier for complex data streams which are generated from non-stationary stochastic properties. Instead of using a single training model and counters to keep important data statistics, the introduced online classifier scheme provides a real-time self-adjusting learning model. The learning model utilizes the multiplication-based update algorithm of the Stochastic Learning Weak Estimator (SLWE) at each time instant as a new labeled instance arrives. In this way, the data statistics are updated every time a new element is inserted, without having to rebuild the model when changes occur in the data distributions. Finally, and most impo…